User-Relevant Access to Textual Information through Flexible Identification of Terms: A Semi-Automatic Method and Software Based on a Combination of N-Grams and Surface Linguistic Filters

Authors

  • Ismaïl Biskri
  • Sylvain Delisle
Abstract

We present a semi-automatic method and software tool for multi-word term identification. Our approach is hybrid in that it combines numeric computations (N-grams) with linguistic filters. The software tool differs from most other term identification tools in that it is semi-automatic by design: it is interactive and constantly under the user’s control. The software supports the work of the knowledge engineer, the (corpus) domain expert, or the linguist by helping them do their job more efficiently. We justify this semi-automatic approach by the need for a more flexible and customisable tool to perform certain term identification tasks. More specifically, in some applications we want to allow the user’s perspective, knowledge and subjectivity to influence the results, within certain limits of course. An example of such an application, on which we are currently working, is Web personalisation: to allow individuals to develop their own vision of the information universes of interest to them, we need flexible and customisable tools that can support them in such a challenging task, not tools that impose on them a pseudo-standardised vision of the world.
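
The general idea of pairing N-gram counts with a surface filter and a user decision step can be illustrated with a minimal sketch. It is not the authors' software: the abstract does not say which N-gram units, filters, or thresholds are used, so the word-level N-grams, the `FUNCTION_WORDS` stop list, the `min_count` threshold, and the `accept` callback standing in for the interactive user are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop list acting as a surface linguistic filter.
# The abstract does not specify the filters actually used.
FUNCTION_WORDS = {
    "a", "an", "the", "of", "in", "on", "to", "and", "or",
    "for", "with", "is", "are", "that",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def word_ngrams(tokens, n):
    """Return every contiguous sequence of n tokens (sentence breaks are ignored)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def passes_surface_filter(ngram):
    """Reject candidates that begin or end with a function word."""
    return ngram[0] not in FUNCTION_WORDS and ngram[-1] not in FUNCTION_WORDS


def candidate_terms(text, n_values=(2, 3), min_count=2):
    """Numeric step: count filtered N-grams and keep the frequent ones."""
    tokens = tokenize(text)
    counts = Counter()
    for n in n_values:
        counts.update(g for g in word_ngrams(tokens, n) if passes_surface_filter(g))
    return [(" ".join(g), c) for g, c in counts.most_common() if c >= min_count]


def review(candidates, accept):
    """Semi-automatic step: the user (modelled by `accept`) keeps or rejects each candidate."""
    return [term for term, _ in candidates if accept(term)]


if __name__ == "__main__":
    sample = (
        "Term identification is hard. Term identification tools help. "
        "The tools of term identification vary."
    )
    proposed = candidate_terms(sample)
    print(proposed)
    print(review(proposed, accept=lambda term: True))
```

In an interactive tool, `accept` would be replaced by a prompt or interface action, which is where the user's perspective and domain knowledge would enter the result.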

Similar resources

A survey on Automatic Text Summarization

Text summarization endeavors to produce a condensed version of a text while preserving its original ideas. Textual content on the web, in particular, is growing at an exponential rate. Sifting through such a massive amount of data to extract useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

Context-Aware Recommender Systems: A Review of the Structure Research

Recommender systems are a branch of information retrieval and matching systems which, by identifying the interests and requirements of users, help them find the desired information or service among a massive selection of choices. In recent years, recommender systems have used descriptive information about the user's context, such as location, time, and task, in order to produce re...

Biogeography-Based Optimization Algorithm for Automatic Extractive Text Summarization

Given the increasing number of documents, sites, and online sources, and users’ desire to access information quickly, automatic text summarization has attracted the attention of many researchers. Researchers have proposed various methods for summarizing texts and for producing useful summaries that include the relevant sentences of a document. This study select...

Query expansion based on relevance feedback and latent semantic analysis

Web search engines are among the most popular tools on the Internet and are widely used by both expert and novice users. Constructing an adequate query that best specifies the user’s information need to the search engine is an important concern for web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...

Identifying and Ranking the Important Textual and Paratextual Elements in Fiction Retrieval

Purpose: The purpose of this study is to identify the textual and paratextual elements involved in retrieving fiction from the readers’ perspective, in order to provide the most appropriate access points for readers and to improve access to fiction based on readers’ needs. Method: The current research is an applied study in terms of purpose, applying a mixed method that was conducted using the ...

Publication date: 2000